One of the largest challenges facing the Motorola (Austin, TX) video communications design team today is verification. Employing emulation in addition to traditional simulation for our latest chip was a key part of avoiding multiple design cycles through manufacturing. The challenges in creating a prototype system underline the need to consider not just what features the chip would perform, but also how to control and communicate with the chip. Using an Aptix (San Jose, CA) MP4 emulation board populated with FPGAs and memory modules, along with several other pieces of equipment and custom development boards, we created a complete video communications system environment. This environment tested the chip design under realistic conditions with user interaction, so we didn't need to rely entirely on simulation vectors. In addition to verification, emulation added benefits not originally foreseen for software development and silicon power-up. All of these factors combined to provide a design methodology we believe begins to answer the verification challenge.
The challenge
The Motorola MC149573 (code name: Cougar) chip provides standards-based motion video encoding and decoding for real-time video conferencing. This chip supports International Telecommunication Union (ITU) standards H.261 and H.263 video encode/decode for performance up to 30 frames per second at common intermediate format (CIF) resolution. Cougar performs at bit-rates ranging from plain-old-telephone-system (POTS) rates up to 4 Mbits per second. Its adaptive rate control allows video conferencing and telephony to take advantage of the broadband communication media now emerging, such as xDSL or cable modem, as well as ISDN and conventional modems. The Cougar chip uses an external Host processor to provide the communications front end, so the chip can be used with any of these communications channels. There are additional display features controlled by individual user preferences, such as a picture-in-picture (PIP) window to display self-view, a downloaded bitmap image, or a still image.
All these features and user preferences are software programmable at any time while the chip runs.
Verification was a significant challenge for the Cougar chip given the variety of parameters and the feature set. The Cougar device continuously adjusts to the various channel rates, bitstream errors, and a wide array of user preferences. These preferences range from quality versus frame rate tradeoffs (for lower bandwidth environments) to display options such as the location of the PIP window. Cougar is fully synthesizable from hardware description language (HDL) code. For initial chip integration and debug, simulation alone was used to verify Cougar. Once Cougar's core functionality existed, more extensive verification was needed. Simulating the entire chip HDL code for 50 video frames takes over a week, and over two weeks after synthesis for gate-level simulations. Because of this time constraint, only a limited number of feature combinations can be tested in simulation. Additionally, simulation doesn't account for the asynchronous nature of external communications such as the Host processor. There was a need not just to accelerate the simulation vectors, but also to have the freedom to change features in real time and to verify the asynchronous nature of the entire system where Cougar resides.
The emulation approach
Emulation offered an environment to verify Cougar that was very different from simulation. It allowed real hardware to replace the models for external interfaces.
It also provided an opportunity to synthesize the HDL code and run Cougar in hardware while retaining more debug capability than silicon allowed. Emulation allowed hundreds of video frames to be processed in minutes rather than weeks or months. Finally, it provided the opportunity to verify an entire system setup, including user interaction. We ran Cougar's emulation environment with scaled-down clocks on all external interfaces as well as a scaled-down internal clock. That way, speed and propagation delay between FPGAs weren't a factor. Additionally, HDL modifications were minimized so there would be little or no difference between the HDL on the prototype and the HDL on the silicon. This was a critical issue since the goal was design verification and risk reduction for the silicon. If the HDL code differed significantly, the emulation effort wouldn't verify the same design as the one in silicon, which would reduce the benefits derived from emulation.
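To put the simulation time constraint in perspective, the rough sketch below converts the figures quoted above (50 frames in a little over a week of HDL simulation, and over two weeks at gate level) into seconds per frame. The emulation figure of 300 frames in 10 minutes is only an assumed stand-in for "hundreds of frames in minutes", not a measured result, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_frame(frames, elapsed_seconds):
    """Average wall-clock time needed to process one video frame."""
    return elapsed_seconds / frames


# Lower bounds taken from the article: "over a week" and "over two weeks".
rtl_sim = seconds_per_frame(50, 7 * SECONDS_PER_DAY)
gate_sim = seconds_per_frame(50, 14 * SECONDS_PER_DAY)

# ASSUMPTION: 300 frames in 10 minutes, chosen only to illustrate
# "hundreds of frames in minutes".
emulation = seconds_per_frame(300, 10 * 60)

if __name__ == "__main__":
    print(f"HDL simulation:  {rtl_sim:.0f} s per frame")
    print(f"Gate simulation: {gate_sim:.0f} s per frame")
    print(f"Emulation:       {emulation:.0f} s per frame (assumed)")
    print(f"Speed-up over HDL simulation: about {rtl_sim / emulation:.0f}x")
```

Even with generous error bars on the assumed emulation figure, the gap is several orders of magnitude, which is what makes interactive testing of feature combinations practical.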
The Aptix MP4 board was crucial in the emulation phase of designing our chip. It provides a hardware platform on which to build the prototype system. The MP4 board allows a variety of FPGA types and other component daughter cards to mount directly to the main backbone in a manner similar to a breadboard. Each FPGA can be reprogrammed as the design evolves without modifying the other FPGAs on the board. Unlike a conventional breadboard, the MP4 doesn't have hardwired interconnects between the daughter cards. Field programmable interconnect components (FPICs) provide the programmable interconnect between the FPGA modules, as well as connections to a logic analyzer for debug. With this flexibility, it isn't necessary to physically modify or re-fabricate a PCB as the design changes.
The Cougar chip consists of ten modules connected by three centralized buses (see Figure 1). Cougar uses a modular approach with a standard interface for each module. The modules communicate with each other over the centralized buses; emulation echoed that same structure. On the emulation board, each module used one FPGA. This allowed each module to be modified as its design changed without affecting the place-and-route results and timing of the other modules. The centralized buses communicated almost all the internal data and control between modules without adding signal pinouts to the designs. The only issue in using one FPGA per module was the capacity of the MP4 board. Cougar's ten modules plus a separate clock-control module required a total of eleven FPGAs, which exceeded the capacity of an Aptix MP4 board. Adding a second board wasn't a viable option.
In response to this issue, Aptix designed adapters that allowed the FPGAs to be mounted vertically on the MP4 board. Although this significantly reduced the pin count available on the individual FPGAs, it increased the capacity of the board. The vertical adapters also carry wing connectors that can interface to external equipment without using the MP4 board itself. These wing connectors presented an ideal mechanism for connecting our off-chip interfaces.
A complete video communications system is application dependent. Cougar's target applications include set-top boxes, PC cameras, and stand-alone videophones, both wired and hand-held. There are, however, components that are common to all applications using the Cougar chip (see Figure 2): a camera, a display (television or LCD panel), a host processor, and external memory. The main system clock is a 66-MHz clock provided by an on-board programmable phase-locked loop (PLL) driven by a 20-MHz external clock.
Between the camera and the Cougar chip is an NTSC/PAL decoder. This chip receives the analog signal from the camera and converts it to a digital signal formatted to either CCIR-601 or CCIR-656. The CCIR-601 formatting requires an 8-bit data bus and three additional signals. One of those signals is the 27-MHz video input pixel clock. The two remaining signals provide the horizontal and vertical timing for the 8-bit data bus. The CCIR-656 format is similar to the CCIR-601 format. There is an 8-bit data bus with a 27-MHz clock. However, the vertical and horizontal timing codes are embedded within the data stream. The 27-MHz clock is completely asynchronous to the main 66-MHz system clock.
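The embedded timing codes of the CCIR-656 format can be illustrated with a short sketch. The bit layout used here (a four-byte FF 00 00 XY sequence, with the field, vertical-blanking, and horizontal flags plus four protection bits in XY) comes from the published ITU-R BT.656 recommendation, the successor to CCIR-656, and not from the Cougar design itself. The function names are hypothetical.

```python
def decode_xy(xy):
    """Decode the fourth byte (XY) of an FF 00 00 XY timing reference code."""
    f = (xy >> 6) & 1  # field bit
    v = (xy >> 5) & 1  # 1 during vertical blanking
    h = (xy >> 4) & 1  # 0 = start of active video (SAV), 1 = end (EAV)
    # Protection bits P3..P0 are XOR combinations of F, V and H.
    expected = (0x80 | (f << 6) | (v << 5) | (h << 4)
                | ((v ^ h) << 3) | ((f ^ h) << 2) | ((f ^ v) << 1)
                | (f ^ v ^ h))
    return {"field": f, "vblank": v, "eav": bool(h), "valid": xy == expected}


def find_timing_codes(stream):
    """Return (offset, decoded code) for every FF 00 00 XY preamble found."""
    codes = []
    i = 0
    while i + 3 < len(stream):
        if stream[i] == 0xFF and stream[i + 1] == 0x00 and stream[i + 2] == 0x00:
            codes.append((i, decode_xy(stream[i + 3])))
            i += 4
        else:
            i += 1
    return codes


if __name__ == "__main__":
    # EAV, four blanking bytes, SAV, then four bytes of picture data.
    sample = bytes([0xFF, 0x00, 0x00, 0x9D, 0x80, 0x10, 0x80, 0x10,
                    0xFF, 0x00, 0x00, 0x80, 0x80, 0x10, 0x80, 0x10])
    for offset, code in find_timing_codes(sample):
        print(offset, code)
```

Because the timing travels inside the data stream, a CCIR-656 receiver needs only the 8-bit bus and the 27-MHz clock, whereas CCIR-601 needs the two extra timing signals described above.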
The interface between Cougar and the display requires a similar conversion chip. An NTSC/PAL encoder receives the digital data from Cougar and converts it to the signals needed to drive the television or LCD panel. The digital data can again be formatted for either CCIR-601 or CCIR-656. For CCIR-601, the NTSC/PAL encoder provides the 27-MHz video output clock and the horizontal and vertical timing signals. For CCIR-656, Cougar provides the horizontal and vertical timing codes embedded in the data stream, while the NTSC/PAL encoder still provides the 27-MHz video output clock. As with the video input clock, the 27-MHz video output clock is completely asynchronous to the main 66-MHz system clock. The video input and video output clocks are also asynchronous to each other, even though they run at the same frequency.
The Host processor interface is a parallel interface that provides local user control and channel communications for the Cougar video processor by sending and receiving encoded video data. The Host retrieves the locally encoded video and sends it across the channel to the remote end. It also passes the received video data to Cougar for decoding and display. It's a memory-mapped, SRAM-like interface with a 6-bit address bus and a 16-bit bi-directional data bus.
Cougar sends and receives data based upon three level-sensitive control signals: a chip select line, a read line, and a write line. This is an asynchronous interface by design, so synchronization with the main system clock is a significant concern.
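As a rough behavioral picture of this SRAM-like port, the sketch below models a 6-bit address space of 16-bit locations gated by chip select, read, and write. It is an illustration only: the class name, the active-high polarity of the control signals, and the plain register storage are assumptions, and it ignores the synchronization to the system clock that makes the real interface difficult.

```python
class HostPort:
    """Toy model of a memory-mapped, SRAM-like host port (hypothetical)."""

    ADDR_BITS = 6    # 64 addressable locations, per the article
    DATA_BITS = 16   # 16-bit bi-directional data bus, per the article

    def __init__(self):
        self.regs = [0] * (1 << self.ADDR_BITS)

    def access(self, cs, rd, wr, addr, data=0):
        """Apply one bus access; return the read data, or None otherwise.

        ASSUMPTION: cs, rd and wr are modeled as active-high booleans.
        """
        if not cs:
            return None  # chip not selected: the bus is ignored
        if rd and wr:
            raise ValueError("read and write asserted together")
        if not 0 <= addr < (1 << self.ADDR_BITS):
            raise ValueError("address does not fit in 6 bits")
        if wr:
            # Keep only the 16 bits that fit on the data bus.
            self.regs[addr] = data & ((1 << self.DATA_BITS) - 1)
            return None
        if rd:
            return self.regs[addr]
        return None


if __name__ == "__main__":
    port = HostPort()
    port.access(cs=True, rd=False, wr=True, addr=0x3F, data=0xBEEF)
    print(hex(port.access(cs=True, rd=True, wr=False, addr=0x3F)))
```

A model like this captures the functional behavior that simulation handles well; what it cannot show is a Host access arriving at an arbitrary phase of the 66-MHz clock, which is the case the emulation system exercised.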
The external memory provides intermediate storage for the video-encode and video-decode processes on the Cougar chip. Cougar communicates with an industry-standard 132-MHz SDRAM to fulfill this requirement. The interface runs at twice the internal 66-MHz system clock. Unlike the other three external interfaces, the SDRAM bus has no asynchronous issue. The timing concern here is clock skew, or phase, between the memory clock and the main system clock.
Communications prototype system
The first challenge in developing the prototype system was how to replace the external chips used in a Cougar system, described above, so that the prototyping environment remained realistic (see Figure 3). Each of the interfaces required that a different issue be resolved. The first external interfaces evaluated were the video input (NTSC/PAL decoder) and video output (NTSC/PAL encoder). The Cougar chip requires continuous motion video input from the NTSC/PAL decoder and provides continuous motion video output to the NTSC/PAL encoder. These external chips can't run slower than real time. The solution was a piece of equipment that could record from a camera input, or read from a video test pattern, and play the sequence back either one frame at a time or in a continuous loop. This equipment also recorded video sequences from the Cougar prototype and played them back on a monitor. Since internal memory stored the video sequences, it could play and record the video data at a clock speed provided by the user.
This equipment became a large frame buffer for the video images, providing the camera input and capturing the television output. The frame buffer interface was emitter-coupled logic (ECL), not transistor-transistor logic (TTL). A printed circuit board (PCB) connected the frame buffer to the emulator by cabling into the FPGA adapter wing connectors for both the video input and video output. The PCB converted the FPGA TTL signals to frame buffer ECL signals, and vice versa. With cabling involved, the signals to and from the FPGAs were buffered and passively terminated to avoid signal integrity and noise issues. The frame buffer also provided the scaled-down versions of the video input and video output pixel clocks that the NTSC/PAL decoder and NTSC/PAL encoder chips normally provide.
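The effect of a scaled-down pixel clock on frame timing can be worked out directly. The sketch below uses the standard ITU-R BT.656 raster sizes (1716 clock periods per line over 525 lines, or 1728 over 625 lines), which at 27 MHz give the familiar 29.97 and 25 frames per second. The 270-kHz scaled clock is purely an assumed example value, since the actual emulation clock rate isn't stated here, and the function name is hypothetical.

```python
# Clock periods per line and lines per frame for the two BT.656 rasters.
CLOCKS_PER_LINE = {"525": 1716, "625": 1728}
LINES_PER_FRAME = {"525": 525, "625": 625}


def frame_period_s(pixel_clock_hz, system="525"):
    """Time taken to transfer one complete frame at the given pixel clock."""
    clocks = CLOCKS_PER_LINE[system] * LINES_PER_FRAME[system]
    return clocks / pixel_clock_hz


if __name__ == "__main__":
    real_time = frame_period_s(27_000_000, "525")
    print(f"27 MHz:  {1 / real_time:.2f} frames per second")

    # ASSUMPTION: a 100x slower clock, chosen only for illustration.
    scaled = frame_period_s(270_000, "525")
    print(f"270 kHz: {scaled:.2f} seconds per frame")
```

A conventional NTSC/PAL decoder or encoder can't stretch its frame period this way, which is why a memory-backed frame buffer with a user-supplied clock was needed.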
The second interface required was the Host processor. The previous system design utilized a Motorola MC56309 digital signal processor (DSP) as the Host processor. Cougar's system design included the same DSP. This DSP, however, couldn't be attached directly to the MP4 board. A development board already existed using the DSP and the previous generation video communications processor, Motorola MC149570 (code name: Cheetah). This existing development platform was modified for Cougar prototyping by adding a connector. The connector allowed the DSP to control either the Cheetah chip on-board or the Cougar chip.
The FPGAs were cabled directly between this new connector and the vertical adapter wing connectors. Additional buffering for the FPGA interface ensured good signal integrity across the cable in both directions.
The final interface that we needed to consider was the external memory interface. The SDRAM chips were capable of running at slower than real-time (132 MHz). There were also adapters readily available to mount the SDRAM chips directly to the MP4 board. This allowed the memory interface to test the control and refresh scheme on the real SDRAM chips instead of a model or a created module.
Prototyping issues
Once the external interfaces were implemented, there were additional internal hardware issues to address for the Cougar prototype system. The main issue was the on-chip buses. As shown in Figure 1, there are three main buses communicating between all the modules: a 7-bit bus driven by a single module, a 1-bit tri-state signal, and a 36-bit tri-state bus. Two of the buses (7-bit and 1-bit) were routed through a special "bus" network on the MP4 board, which minimized signal routes. Because the vertical adapters reduced the pin count available on the FPGA module, there weren't enough pins to include the 36-bit bus in the "bus" network. Instead, a daisy-chain cable connected to the vertical adapter wing connectors for each FPGA. This cabling provided a direct connection for the 36-bit bus, but in the process added a significant amount of capacitance. To alleviate the added capacitance, four individual daisy-chain cables were used with connection boards between the cables. These connection boards provided direct signal connectivity with sockets for inline pull-up resistors. Three separate boards allowed four daisy-chain cable segments to provide a continuous bus with the possibility of three pull-up resistors in parallel per signal. Of the three connection boards, only one needed the pull-up resistors populated to provide the timing required on the 36-bit tri-state bus. The pull-up resistors on the other two connection boards remained unpopulated. Additional changes for the prototype system included slowing the DSP and memory clocks to a fraction of their normal operating speed and increasing the wait states for the DSP.
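The trade-off between cable capacitance and pull-up strength can be estimated with a first-order RC model. The sketch below is only an illustration: the 1-kilohm resistor value and the 100-pF bus capacitance are assumed numbers, not measurements from the prototype, and the 2.2 x R x C expression is the usual 10 to 90 percent rise time of a single-pole RC circuit.

```python
def parallel(*resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


def rise_time_s(r_ohm, c_farad):
    """10%-90% rise time of a first-order RC node."""
    return 2.2 * r_ohm * c_farad


if __name__ == "__main__":
    R_PULLUP = 1_000.0  # ASSUMPTION: 1 kilohm per pull-up socket
    C_BUS = 100e-12     # ASSUMPTION: 100 pF of cable and pin capacitance

    for populated in (1, 2, 3):
        r_eq = parallel(*([R_PULLUP] * populated))
        t_rise_ns = rise_time_s(r_eq, C_BUS) * 1e9
        print(f"{populated} board(s) populated: "
              f"{r_eq:.0f} ohm, rise time about {t_rise_ns:.0f} ns")
```

Populating more boards speeds up the rising edge of a released tri-state line but also increases the current each driver must sink when pulling the line low, which is one reason to stop at the smallest number of pull-ups that meets timing.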
Once the hardware issues were solved, some HDL changes between the silicon code and the FPGA code were required. The SDRAM chips required the same refresh timing in the prototype as they did in the original system. With a slower system clock, however, fewer clock cycles elapsed within each refresh period, leaving fewer cycles in which to issue the required number of refresh commands. To compensate for the slower clock speed, the memory-control module was changed to provide the needed number of refresh cycles. This code modification was performed only on the emulation code and not on the silicon memory-control module. One HDL modification needed for each module was replacing the silicon SRAM models with FPGA SRAM models. Every Cougar module contains internal "scratchpad" SRAMs to provide storage for intermediate results. Since these SRAMs are instantiated from the silicon library in the silicon code, the FPGA version must provide its own equivalent. The FPGA library provided SRAM configurations that behaved in the same way as the silicon library SRAMs. These FPGA memories were generated and instantiated in memory wrappers for each module. The memory wrapper provided a pin-out identical to the silicon library SRAM models, so the HDL code controlling the SRAMs didn't need to be modified. The FPGA memory wrappers replaced the silicon SRAMs directly.
After the hardware platform and HDL changes were in place, the DSP device driver was developed. A GUI was written for a PC. This GUI allowed direct access to the Cougar Host processor interface as well as to pre-programmed options. In this way, the testing verified pre-defined default settings, such as those for a typical ISDN call, in addition to individual user preferences, such as PIP location. To facilitate debug, logic analyzers were connected to the main system buses, DSP interface, video input, video output, and SDRAM interface using both the MP4 debug feature and direct connections to the needed signals. With the hardware, software, and debug platforms complete, verification began in earnest.
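The refresh problem comes down to simple arithmetic: the SDRAM's refresh requirement is fixed in real time, so the number of clock cycles available between refresh commands shrinks with the clock. The sketch below assumes a typical requirement of 4096 refresh commands every 64 ms, which is common for SDRAM devices but isn't confirmed for the parts used here, and a hypothetical 1-MHz scaled memory clock.

```python
def refresh_interval_cycles(clock_hz, refresh_period_us=64_000, rows=4096):
    """Maximum whole clock cycles allowed between two refresh commands.

    ASSUMPTION: the defaults (4096 refreshes per 64 ms) are typical SDRAM
    values, not figures taken from the Cougar memory controller.
    """
    # Integer arithmetic avoids rounding surprises; the result rounds down.
    return (clock_hz * refresh_period_us) // (rows * 1_000_000)


if __name__ == "__main__":
    full_speed = refresh_interval_cycles(132_000_000)
    scaled = refresh_interval_cycles(1_000_000)  # ASSUMED emulation clock
    print(f"132 MHz: refresh at least every {full_speed} cycles")
    print(f"1 MHz:   refresh at least every {scaled} cycles")
```

A refresh counter sized for the full-speed clock would therefore refresh far too rarely at the scaled clock, which is why the emulation version of the memory-control module needed its own setting.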
Unearthing additional issues
The verification effort revealed several previously unknown design issues, which required HDL code changes. These issues fell into three categories:
1. Issues tested by simulation, which could have been caught. These were problems caused by incomplete interface models, or were problems that could have been caught had the simulation been run in a different way.
2. Issues not tested in simulation. There were items not tested in simulation due to time constraints and because the emulation system tested them better than a simulation environment could.
3. Problems caught because the emulation system used asynchronous external interfaces just like the real system did. These issues had passed simulation and failed in emulation because the user requested a change that wasn't synchronized with the on-chip timing.
The first category of issues could be fixed with more thorough modeling and a more thorough simulation environment. Category two issues were expected in emulation because the system complexity needed to test those features didn't exist in the simulation environment. The third category of issues presented the most compelling support for emulation in addition to simulation. Although not impossible to catch in simulation, the third category required several events to happen at particular times and in a specific order to be caught. The probability of these events happening in simulation, so that the bugs would be unearthed, was very small. These third category issues required code changes and re-synthesis. Had they been fabricated into the silicon, they would have required an all-layer change to fix or a feature set reduction.
In addition to verification, the emulation system aided in software development. While the Cougar chip was in fabrication, the production software driver was developed on the DSP using the emulator as the Cougar chip. With the logic analyzer connections, it was easier to see exactly what commands were programmed and how the "chip" reacted at a level that wouldn't be possible once silicon replaced the emulator. The production device driver tailored for the end system application was written, tested, and integrated into the entire DSP stack by the time Cougar came back from fabrication.
The emulation effort also assisted silicon debug. With the platforms developed, silicon power-up was very clean. We removed the cable connecting the DSP development board to the emulator. Then, a specially designed daughter card containing Cougar silicon was attached to the same connector. From the software perspective, the lab DSP device driver needed modification to control the real NTSC/PAL decoder and encoder. With the hardware and software in place, silicon debug was underway within a day of attempting power-up on the lab bench. As a result of the entire prototyping effort, we learned many lessons that will improve our simulation and emulation strategies for our next-generation MPEG-4 chip, currently in development.
Cinda Flynn is a member of the Image Capture digital design team in Motorola's Semiconductor Products Sector.